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Z^i I Abstract 

In this paper, we address the theoretical limitations in reconstructing sparse signals (in a known 
complete basis) using compressed sensing framework. We also divide the CS to non-blind and blind 
cases. Then, we compute the Bayesian Cramer-Rao bound for estimating the sparse coefficients while 
the measurement matrix elements are independent zero mean random variables. Simulation results show 
a large gap between the lower bound and the performance of the practical algorithms when the number 
^ ' of measurements are low. 

Index Terms -Compressed sensing, Sparse component analysis, Blind source separation, Cramer-Rao 
bound. 
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I. Introduction 
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Compressed Sensing or Compressive Sampling (CS) [1], [2] is an emerging field in signal processing. 

The theory of CS suggests to use only a few random linear measurement of a sparse signal (in a basis) 



for reconstructing the original signal. The mathematical model of noise free CS is: 

y = *x (l) 



X 
fa. 

where x = SI/w is the original signal with length m and is sparse in the basis \I/(i.e.,||w||o<-K" and K is 
defined as sparsity level) and $ is an n x m random measurement matrix where n < m. For near perfect 
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recovery, in addition to the signal sparsity, the incoherence of the random measurement matrix <1> with the 
basis \l/ is needed. The incoherence is satisfied with high probability for some types of random matrices 
such as i.i.d Gaussian elements or i.i.d Bernoulli ±1 elements. Recent theoretical results show that under 
these two conditions (sparsity and incoherence), the original signal can be recovered from only a few 
linear measurements of the signal within a controllable error, even in the case of noisy measurements 
[1], [2], [3], [4]. 

In [3], some error bounds are introduced for reconstructing the original sparse (or compressible) signal 
in the noisy CS framework. In [4], the performance limits of noisy CS is investigated by definition 
of some performance metrics which are of Shannon Theoretic spirit. [1] considers the no noise CS 
and finds an upper bound on reconstruction error in terms of Mean Square Error (MSE) only for i 1 - 
minimization recovery algorithm. But, [3] finds some upper bounds in the noisy CS and for general 
recovery algorithms. [4] is also investigated its own decoder which is derived based on joint typicality. 
Moreover, some information theoretic bounds are derived in [5]. 

In this paper, we derive a Bayesian Cramer-Rao Bound (BCRB) ([6], [7]), which is a lower bound, for 
noisy CS by a statistical view to the CS problem. This BCRB bounds the performance of any parametric 
estimator (whether biased or unbiased) of the sparse coefficient vector in terms of mean square estimation 
error [6], [7]. We also introduce the notion of blind CS in contrast to the traditional CS to whom we 
refer on the non-blind CS. We compute BCRB for both non-blind and blind CS, where in the latter, we 
do not know the measurement matrix in advance. In a related direction of research, a CRB is obtained 
for mixing matrix estimation in Sparse Component Analysis (SCA) [8]. 

II. Non-blind and blind noisy CS 
Consider the noisy CS problem: 

y = **w + e = Dw + e (2) 

where D = $*, w is a sparse vector and e is a Gaussian zero-mean noise vector with the covariance 
cigl. In CS framework, we want to estimate w, from which, x = \l/w can be reconstructed from the 
measurement vector y. 

We nominate the traditional CS problem as non-blind CS since we know the basis \I/ and the measure- 
ment matrix $ and hence D in advance. In some cases, we have no prior information about the signals 
in addition to their sparsity. As such, we do not know the basis \&, in which the signals are sparse. One 
application is a blind interceptor who intercepts the signals. The only information is that the signals have 
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been received are sparse in some unknown domain. In these cases, we nominate the problem as blind 
CS which is inspired from the well known problem of Blind Source Separation (BSS). As such, each 
measurement will be: 

y = <£> T *w + e = d T w + e (3) 

where <p T is the random measurement vector and a row of <1?) and d T = (p T ^f is the corresponding row 
in D and an unknown random vector. 



III. Bayesian Cramer-Rao Bound 

The Posterior Cramer-Rao Bound (PCRB) or Bayesian Cramer-Rao Bound (BCRB) of a vector of 
parameters 6 estimated from data vector y is the inverse of the Fisher information matrix, and bounds 
the estimation error in the following form [7]: 



E 



[e-e){e-e 



xT 



>J 



-1 



where 6 is the estimate of 6 and J is the Fisher information matrix with the elements [7]: 

<9 2 logp(y,0)" 



J j 



R 



y .e 



dO-idOj 



(4) 



(5) 



where p(y,0) is the joint probability between the observations and the parameters. Unlike CRB, the 
BCRB (01) is satisfied for any estimator (even for biased estimators) under some mild conditions [6], [7] 
which we assume that are fulfilled in our problem. Using Bayes rule, the Fisher information matrix can 
be decomposed into two matrices [7]: 

J = Jd+Jp, (6) 

where Jo represents data information matrix and Jp represents prior information matrix which their 
elements are [7]: 



JDi 3 — -Ey,0 



d 2 \ogp(y\6) 



de t d9j 



Ee(J s J 



JPij = Eg 



d 2 log p(6) 
dOidOj 



(7) 
(8) 



9 2 logp(y|0)-| 



where J s = E y \g[— U qq'qq^" 1 } is the standard Fisher information matrix [9] and p{6) is the prior 
distribution of the parameter vector. 

In this paper, we use this BCRB for our problem because we have a sparse prior information about 
the parameter which is estimated. We compute BCRB for two blind and non-blind cases. 
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A. Computing BCRB in non-blind CS 

In the non-blind CS case, the matrices 3> and \l/ are assumed to be known and 3> is a random matrix 
while * is a fixed basis matrix. Similar to [10], since 3> is assumed to be known and random, <& can 
be added as an additional observation. Hence, the data information matrix elements Jd. . from model ([2]) 
are of the form: 



Jd zj — ^y,w,# 



d 2 logp(y, 4>|w) 



(9) 



dwidwj 

since p(y,<&|w) = p(*)p(y|*, w), p(&) is independent of w and p(y|$,w) = (27r<7g)~ exp(^p-||y — 
Dw||D, we can write 81ogp ^' #|w) = ^(-2y T D + 2D T Dw). So, we have g]gi|&£W = J ? (y T D) i - 
^2 J2T=i 9ir w r where gij denotes the elements of the matrix G = D T D. Hence, we have — ^q^, = 
^rgij. So, the expectation © will be J D .. = -E y , w ,* -^gij = ^£^{s^} = J D%j = 4j- Ylr=i E®{d ri d r j}. 
Some simple manipulations show that under assumption that the elements of <& are zero mean and 
independent random variables, the data information matrix will be: 

2 

Jp = rA* T * (10) 

<*% 

where a 2 = E(ip 2 ;) is the variance of the random measurement matrix elements. If * is an orthonormal 

rp 0-2 

basis then \I/ \I/ = I and hence Jp = ra ^I- 

To compute the prior information matrix Jp from <[Sj>, we should assume a sparse prior distribution 

for our parameter vector elements ioj. Similarly to [11], we assume Wi's are independent and have a 

parameterized Gaussian distribution: 

1 w 2 

p(wi) = — -=exp(- — \), (11) 

aiV2ir ia i 

In (TTTb . the variance a 2 enforce the sparsity of the corresponding coefficient: a small variance means that 
the coefficient is inactive and a large value means the activity of the coefficient. It can be easily seen 
that in this case, the prior information matrix is Jp = diag(A-)- Finally, for orthonormal bases for ^ 
and for prior distribution (TTTb . the BCRB results in: 

E[(w i -tB i ) 2 ]>(n^ + ^\ . (12) 

B. Computing BCRB in blind CS 

In the blind CS case, the matrix \I/ is not known in advance and hence the elements of matrix D are 
random and unknown with zero mean. If we restrict ourselves to Gaussian measurements matrix elements 
(ipij is a zero-mean Gaussian) then different measurement samples of y are also Gaussian and independent 
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of each other. Hence, we can compute the data information matrix from only one measurement ([3]). Then, 

Vwfdw^ } wiU be ec l ual to ( refer to [ 9 ]) : 

dlogp(y\w) dlogp(y\v/) 



the information matrix elements Jjj.. = E y ^ 

J0 %] = Ey jV/ 



dwi 



dwj 



(13) 



If the elements of cp are assumed to be random with a Gaussian distribution of zero mean and variance 
a 2 and the columns of the basis matrix \I/ have unit norms, then: 



p(y\w) 



1 



; exp(- 



y 



V27rcT 2 (w) ' 2a 2 ( 
where <r 2 (w) = cr 2 + ofHwH 2 ,. Simple manipulations show: 

dlogp(y\-w) Wiol , 2 



w 



dwi 



<r 4 (w) 



KW-v* 



and from ([T3l we should compute: 



4 / WiWj 



cr 8 (w) 



a 2 (yv)-y 2 ) 2 p(y\y/)dy 



p(v/)dv/ 



(14) 



(15) 



(16) 



where the internal integral is J (a 2 (w) — y 2 ) 2 p(y\w)dy = m^ — 2er 2 (w)m2 + <7 4 (w) in which rri2 and 
m^ are the second and fourth order moments equal to mi = cr 2 (w) and m.4 = 3<r 4 (w). So, we have 
f (cr 2 (w) - y 2 ) 2 p(y\w)dy = 2cr 4 (w) and then: 



Jd„ = 2°t / ^SKw)dw 



a 4 (w) J 



(17) 



where the off diagonal terms are zeros J^. = 0,j / i because the integrand is an odd function. The 
diagonal terms are: 



Jd u = 2cr r 



w- 



((x 2 + a 2 ||w|| 2 ) 2 
Following Appendix Jl the diagonal elements are simplified as: 

J Du = ^(A 1 - a 2 A 2 ) 



p(v/)dv/ 



(18) 



(19) 



where A\ and A 2 are defined and calculated in Appendix HI 

The prior information matrix for BG distribution p(wi) = pS(wi) + (l—p) — \== exp(— ^7) is calculated 
in Appendix HT1 

I — n 

(20) 



Jp = ^I 



Finally, the Blind BCRB is calculated as: 



E {( 



Wi - Wi 



> 2 £r ( Al _ a l A2 ) + 



1 — p 



a- 



(21) 
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IV. Simulation results 

In this section, we compare the CRB's with the results of some of the state-of-the-art algorithms for 
signal reconstruction in CS. In our simulations, we used sparse signals with the length m = 512 in the 
time domain where \l> = I. We used a BG distribution with the probability of being nonzero equal to 
1 — p = 0.1 and the variance for nonzero coefficients is equal to a 2 = (0.5) 2 . So, in average there 
were 51 active coefficients. We used a Gaussian random measurement matrix with elements drawn from 
zero mean Gaussian distribution with variance equal to a 2 = 1. The number of measurements are varied 
between 60 to 200. We computed the Mean Square Error (MSE) for sparse coefficient vector over 100 
different runs of the experiment: 

( 1 10 ° \ 

MSE^101og 10 (— £||w r -w r ||!J (22) 

where r is the experiment index. We compared this measure for various algorithms with the average value 
of BCRB for non-blind case which is equal to — trace(J _1 ). The algorithms used for our simulation are 
Orthogonal Matching Pursuit (OMP) [12], Basis Pursuit (BP) [13], Bayesian Compressive Sampling 
(BCS) [14] and Smoothed-LO (SL0)U [15]. We also computed the BCRB for blind case d2fl to compare 
the BCRB's in both blind and non-blind case. Figure Q] shows the results of the simulation. It can be 
seen that in the low number of measurements, there is a gap between the BCRB and the performance of 
algorithms while one of the algorithms approximately reaches the BCRB for large number of measure- 
ments. Moreover, the difference between the BCRB's for the non-blind and blind cases are very large. 
It shows that the blind case needs much more linear measurements than the non-blind case. 

To verify the approximation D\ ~s and L>2 ~ (refer to Appendix [TT]), we calculated the integrals 
numerically with parameters p = 0.9 and a = 1. When o"o = 10 -5 then D\ = 4.7990 x 10 -25 and 
L>2 = 2.7673 x 10~ 19 . It shows that our approximations are true for sufficiently small value of gq. 

V. Conclusions 

In this paper, the CS problem is divided into non-blind and blind cases and the Bayesian Cramer-Rao 
bound for estimating the sparse vector of the signal was calculated in the two cases. The simulation 
results show a large gap between the lower bound and the performance of the practical algorithms when 

'We used the OMP code from http://sparselab.stanford.edu with 50 iterations, the BP code from 

http://www.acm.caltech.edu/llmagic/lleq-pd.rn with pdtol=le-6 and its default parameters, the BCS code from 

http://people.ee.duke.edu/~lihan/cs with its default parameters and the SL0 code from http://ee.sharif.edu/~SLzero with 
parameters sigma-min=0.001 and sigma-decrease-factor=0.9. 
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Fig. 1. MSE versus number of measurements for a sparse signal in the time domain (^ = I) with length m = 512 and with 
the BG distribution with parameters p = 0.9, o\ — 0.5 and <?2 = 0. Measurement matrix elements are unit variance Gaussian 
random variables. 



the number of measurements are low. There was also a large gap between the BCRB in both non-blind 
and blind cases. It also shows that in the blind CS framework, much more blind linear measurements of 
the sparse signal are needed for perfect recovery of the signal. 



Appendix I 
Computing the integral 

Let define Ii = J w / CT2+0 ^|| w |m2 P(w)dw and assume an equal prior distribution for all coefficients Wi, 
then all /j's are the same because of the symmetry of the integral. So, we can add all the integrals and 
write: 



ma^Ii 



at w 






p(w) 



\oj + ai w 



-dv/ 



o P 



\2! 

p(w) 



(23) 



l0i+0f||W M2 



Then, if we nominate the two above integrals as A-\ = \ , , , \,; ui , dw and Ai = / , , , \J m^ dw, 

° 1 Jw (0'e+ cr rll w ll2) Jw ( CT c + CT r 1 1 w l I2) 



'" 2 + 0-2 J 

P(w) 



2N2 



(iw 



P(w) 



the integral Ii is computed as Ii = —^ {A\ — o^A-i). To compute A\ and A%, we approximate the joint 
probability distribution of coefficients as: 



PW 






U>i 



p 



TP(o+ 



1=1 



p 



-1,-, A ^lL=Wr<H^ 

(i -p) 2^ ^f= — ( ' X P 



r=l 



0-V27T 



2^f 



(24) 
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This approximation is based on the assumption that the value of (1—p) which is the activity probability 
is very small and so we can neglect the higher order powers of (1 — p). By this approximation, the two 
integrals will be approximately: 

p m mp m ~ 1 (l—p) 

p m rnp m ~ x (\— p) 

M = — o H 7E= B 2 



a 



I <T-v/27T 



where the two integrals are B\ = J w ^ +0 ^) dw and B 2 = j w ^ +(T I^2 dw. By change of variable 
x = -^j=, the two integrals are equal to: 



Bl= ' r^^ > Cl 



V2aa 2 J-oo a 2 + x 2 ~ y/2aa 2 

B - 1 f + °° eXp( ~ x2) rlr= 1 C 

2 2V2CTW J-oo (a 2 + x 2 r aX ~2V2a^ 2 



where a 2 = 2 a^ a 2 • ^ ne arj ove integrals are equal 



A 



c 



2 



Ci = -exp(a 2 )[l-erf(a)] 

7rexp(a 2 ) [ 2 2^/tt 2 

1 - 2a z H . . - erf (a) + 2eTerf(a) 

7rexp(a^J 



2a 3 
where erf(x) is the error function, defined as erf(x) = -j= f~ exp(— £ 2 )d£. 

Appendix II 
Prior information matrix for BG distribution 

Since the coefficients Wi's are independent, the off diagonal terms Jp tj ,i / j are zero. Because of the 

independence of Wi's, we can write Jp.. = E w .{ gf^ }■ To calculate this term, we use a Gaussian 

distribution with small variance Og instead of delta function 5(wi). So, the prior is: 

2 2 

p{wi) = A exp ( - gL) + B exp ( - |^) (25) 

where ^4 = — 2^, Z? = 7^- and an — >• 0. The partial derivative can be calculated as: 

<Tov2-7r <TV27T 

9 2 logp(^j) 1 d 2 p(Wi) 1 fdp(wi)\- 

dw 2 p(wi) dw 2 p 2 (wi) V Swj / 

Hence, we have: 

Jp - = ~ / a 2 dwi + / "7 — V ~~ a dwi ( 27 ^ 

7_oo dwf J_ 00 p{wi) V dwi ) 

2 We used Maple software to compute the integrals analytically. 
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To compute the above integrals, the partial derivatives are g™ z ' = — -p 1 exp(— ^v) -^- exp(— ^ 



d 2 p(wi) __ A / w i \ \ Aw? I wl \ B ( w i \ i Bw 



and ^f 2 = -^ ex P("^) + -it exp ("^) ~ £ ex P("&) + -pt ex P("^)- Sim P le calculations 
show that j Qu]i dwi = and hence: 

Jp = / —^ —^ 3 dwi (28) 

J-oo Aexp(-2^) + 5exp(-^2) 

-/l 2 ™ 2 TO 2 

r+oo g 4 ' ex P(~i^-) 

where the above integral can be decomposed to three integrals which are D\ = b-j s — 3— ou>j, 

-ABm? ,_ w? _ ra 2 . -B 2 ™ 2 u, 2 

D 2 = r + °° ^^ 6 2 X ^ -^-dwj and £> 3 = f + °° "^t cxp( " i ^ ) 2 rf W< . Since we have a 

°° Acxp(-^)+Bcxp(-^) • °° Acxp(-^)+£cxp(-^) 

term u; 2 in the numerator of the above integrals and the Gaussian term with small variance is large near 
zero, we can neglect the Gaussian term with small variance (delta function) in the denominator. So, the 
integrals D\ and D2 with neglecting this term will be approximately zero. We verify this approximation 
in the simulation results by computing these integrals numerically. Finally, the third integral will be 

R 2 .,, 2 ,„ 2 

~ i 1 i \ 

approximately D3 « j_ °° ai a2 g dw\. Calculating this integral results is Jp u « D3 « -^f-. 
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